Let’s assume you are an HR manager (with a soft spot for statistics) and you are trying to find out how to best analyse data from a survey about employee satisfaction.
The consultant you hired cam back with a bunch of colorful charts and employee satisfaction numbers that were precise to 3 digits, but you couldn’t stop thinking of you professors rambling that firms would learn more about their employees if they uses the data better and did some multilevel analysis.
You weren’t entirely convinced because you found his argumentation to abstract, but you also were a good hacker and thought now is the time to just simulate some data and see if one is really doing better with a multilevel analysis.
Here are the things you know about your firm:
Here is this expressed in some variables.
n_departments = 15
set.seed(123)
n_employees =
rep(c(5,22,98), each = 5) +
rbinom(n_departments,5,.5)
mean_firm_satisfaction = 3
sd_firm_satisfaction = .25
Here is a simple model of the data generating process, and its translation in simulation code:
| model | simulation code | in words |
|---|---|---|
| \(\mu_d = normal(3, .25)\) |
|
The average satisfaction in departments has a normal distribution with mean_firm_satisfaction = 3 and sd_firm_satisfaction = 0.25 |
| \(S_i = normal(\mu_d, 1)\) |
|
The satisfaction of individual employees varies with sd = 1 around the average satisfaction in departments. |
No we can simulate satisfaction of employees in the firm:
set.seed(1)
department_mus =
rnorm(n_departments,
mean_firm_satisfaction,
sd_firm_satisfaction)
firm_satisfaction = c()
for (d in 1:n_departments) {
employee_satisfaction =
rnorm(n_employees[d],
department_mus[d],
1)
firm_satisfaction = rbind(
firm_satisfaction,
data.frame(
department = as.integer(d),
satisfaction = employee_satisfaction
)
)
}
Here is a brief summary of the data:
dep.satisf =
describeBy(firm_satisfaction$satisfaction,
group = firm_satisfaction$department,
mat = TRUE, skew = FALSE) %>%
.[,-1]
dep.satisf %>%
kable(digits = 1, row.names = FALSE) %>%
kable_styling(full_width = FALSE)
| group1 | vars | n | mean | sd | min | max | range | se |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 7 | 3.4 | 0.4 | 2.8 | 3.8 | 1.0 | 0.2 |
| 2 | 1 | 8 | 2.7 | 0.9 | 1.1 | 3.7 | 2.6 | 0.3 |
| 3 | 1 | 7 | 2.7 | 0.8 | 1.4 | 4.1 | 2.7 | 0.3 |
| 4 | 1 | 9 | 3.5 | 0.7 | 2.7 | 4.5 | 1.8 | 0.2 |
| 5 | 1 | 9 | 3.3 | 0.8 | 2.0 | 4.5 | 2.6 | 0.3 |
| 6 | 1 | 23 | 2.9 | 1.1 | 1.0 | 5.2 | 4.2 | 0.2 |
| 7 | 1 | 25 | 3.1 | 0.9 | 1.6 | 4.7 | 3.1 | 0.2 |
| 8 | 1 | 26 | 3.3 | 0.8 | 2.5 | 5.0 | 2.4 | 0.1 |
| 9 | 1 | 25 | 2.7 | 0.9 | 1.2 | 5.2 | 4.0 | 0.2 |
| 10 | 1 | 24 | 3.2 | 1.1 | 1.4 | 5.2 | 3.8 | 0.2 |
| 11 | 1 | 102 | 3.4 | 1.0 | 0.5 | 6.0 | 5.5 | 0.1 |
| 12 | 1 | 100 | 3.2 | 1.0 | 0.5 | 5.1 | 4.6 | 0.1 |
| 13 | 1 | 101 | 2.8 | 1.1 | -0.2 | 5.0 | 5.2 | 0.1 |
| 14 | 1 | 101 | 2.4 | 1.1 | -0.1 | 6.3 | 6.3 | 0.1 |
| 15 | 1 | 99 | 3.1 | 1.1 | 0.3 | 6.0 | 5.6 | 0.1 |
A standard way to visualize these results is to use bar charts of group means, maybe together with error bars. The figure also shows the true department level satisfaction, which we usually cannot observe, as green stars.
plot_data = function(return.x = FALSE) {
set_par()
se = dep.satisf$se
x = barplot(dep.satisf$mean,
names.arg = dep.satisf$group1,
xlab = "department",
ylab = "satisfaction",
ylim = c(0, max(dep.satisf$mean+1.96*se)))
arrows(x0 = x,
y0 = dep.satisf$mean - 1.96*se,
y1 = dep.satisf$mean + 1.96*se,
length = 0)
abline(h = mean(firm_satisfaction$satisfaction), col = "red", lty = 3)
points(x,department_mus, pch = 8, col = "green3")
if (return.x == TRUE)
return(x)
}
plot_data()
In our simulated world, the true and the measured satisfaction
differ due to random measurement error, as indicated by this line in the
simulation code:
employee_satisfaction = rnorm(n_employees[d], department_mus[d], 1)
How we want to analyse the data depends partially on our assumptions about the data generating process. In the context of multilevel modeling, we can think of three qualitatively different assumptions:
A Bayesian model to estimate identical group means (full pooling) looks as follows:
\[ S_i \sim normal(\bar \mu, \sigma) \\ \bar \mu \sim normal(3,3) \\ \sigma \sim exponential(1) \\ \]
where \(\bar \mu\) is the average satisfaction in the firm.
However, to facilitate the comparison to alternative models, we re-write the model as follows:
| Full pooling |
|---|
| \[\begin{align*} S_i \sim normal(\mu_d, \sigma) & \;\;\;\;{\small \textrm{Individuals satisfaction is department satisfaction plus error}} \\ \mu_d \sim normal(\bar \mu,.0001) & \;\;\;\;{\small \textrm{Department satisfaction is equal to average firm satisfaction}} \\ \bar \mu \sim normal(3,3) & \;\;\;\;{\small \textrm{Prior for average firm satisfaction}} \\ \sigma \sim exponential(1) & \;\;\;\;{\small \textrm{Prior for error variance}} \\ \end{align*}\] |
\(\mu_d\) are department level satisfactions, which depend on the firm level satisfaction \(\bar \mu\) and the variations between departments. By setting standard deviation for the variation between departments to 0.0001, we are enforcing that the department level means will be equal.
Here are the corresponding ulam models:
m.full_pooling = alist(
S ~ dnorm(mu_bar, sigma),
mu_bar ~ dnorm(3,3),
sigma ~ dexp(2)
)
m.full_pooling.b = alist(
S ~ dnorm(mu, sigma),
mu <- mu_bar + z[d]*.0001,
z[d] ~ dnorm(0,1),
mu_bar ~ dnorm(3,3),
sigma ~ dexp(2),
# calculate mu per department
# in generated quantities
gq> vector[d]:mus<<-mu_bar+z*.0001
)
NOTE The second model implements what is called a “non-centered parameterization” of the model. Here are again two formulations of the model:
| Centered | Non-centered |
|---|---|
|
|
Before we fit the model, we put the data into a list:
u.data = list(
S = firm_satisfaction$satisfaction,
d = as.integer(firm_satisfaction$department)
)
Now we estimate the models …
n.cores = ifelse(Sys.info()["sysname"] == "Darwin",4,1)
fit.full_pooling = ulam(
m.full_pooling,
data = u.data,
log_lik = TRUE, iter = 1000,
chains = 4, cores = n.cores)
fit.full_pooling.b = ulam(
m.full_pooling.b,
data = u.data,
log_lik = TRUE, iter = 1000,
chains = 4, cores = n.cores)
and check convergence:
precis(fit.full_pooling) %>%
round(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu_bar 3.00 0.04 2.93 3.07 1475.88 1
## sigma 1.06 0.03 1.02 1.11 1448.46 1
precis(fit.full_pooling.b,
depth = 2) %>%
round(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## z[1] 0.00 1.01 -1.57 1.65 2754.74 1
## z[2] 0.00 1.04 -1.66 1.64 2563.12 1
## z[3] 0.02 0.98 -1.52 1.63 2240.41 1
## z[4] 0.02 0.94 -1.52 1.47 2573.14 1
## z[5] -0.01 0.99 -1.61 1.54 2618.93 1
## z[6] -0.02 0.98 -1.61 1.57 2387.78 1
## z[7] 0.02 0.98 -1.54 1.59 1947.16 1
## z[8] 0.00 0.98 -1.58 1.54 1970.63 1
## z[9] 0.00 0.98 -1.54 1.56 2560.57 1
## z[10] -0.02 0.98 -1.59 1.63 2683.93 1
## z[11] 0.03 1.00 -1.51 1.62 1947.40 1
## z[12] 0.02 0.97 -1.52 1.61 2407.30 1
## z[13] -0.02 1.01 -1.60 1.53 2050.92 1
## z[14] -0.01 1.00 -1.64 1.53 2152.81 1
## z[15] 0.01 1.05 -1.65 1.68 2519.87 1
## mu_bar 3.00 0.04 2.94 3.06 2293.38 1
## sigma 1.06 0.03 1.01 1.11 3189.93 1
## mus[1] 3.00 0.04 2.94 3.06 2294.97 1
## mus[2] 3.00 0.04 2.94 3.06 2294.36 1
## mus[3] 3.00 0.04 2.94 3.06 2293.57 1
## mus[4] 3.00 0.04 2.94 3.06 2293.68 1
## mus[5] 3.00 0.04 2.94 3.06 2293.83 1
## mus[6] 3.00 0.04 2.94 3.06 2293.44 1
## mus[7] 3.00 0.04 2.94 3.06 2292.56 1
## mus[8] 3.00 0.04 2.94 3.06 2293.30 1
## mus[9] 3.00 0.04 2.94 3.06 2293.02 1
## mus[10] 3.00 0.04 2.94 3.06 2293.27 1
## mus[11] 3.00 0.04 2.94 3.06 2294.59 1
## mus[12] 3.00 0.04 2.94 3.06 2293.82 1
## mus[13] 3.00 0.04 2.94 3.06 2291.77 1
## mus[14] 3.00 0.04 2.94 3.06 2293.53 1
## mus[15] 3.00 0.04 2.94 3.06 2293.14 1
Do the two models really describe the data equally well, and is the number of effective parameters more or less equal?
compare(fit.full_pooling,
fit.full_pooling.b) %>%
round(2)
## WAIC SE dWAIC dSE pWAIC weight
## fit.full_pooling.b 1969.22 36.14 0.00 NA 1.90 0.52
## fit.full_pooling 1969.39 36.13 0.17 0.06 1.99 0.48
Indeed, the difference between the two models is very small. More importantly, we can see that the number of effective parameters of the two versions of the full polling model are 1.99 and 1.9, even though the models have 2 and 17 parameters, respectively.
And here is our initial figure with the estimated average employee satisfactions:
x = plot_data(return.x = TRUE)
est.full_pooling = precis(fit.full_pooling.b, pars = "mus", depth = 2)
points(x,est.full_pooling$mean, col = "blue", pch = 16)
As expected we estimated the same mean for everyone
A Bayesian model to estimate independent group means looks as follows:
| Full pooling | No pooling |
|---|---|
| \[ S_i \sim normal(\mu_d, \sigma) \\ \mu_d \sim normal(\bar \mu,.0001) \\ \bar \mu \sim normal(3,3) \\ \sigma \sim exponential(1)\] | \[S_i \sim normal(\mu_d, \sigma) \\ \mu_d \sim normal(\bar \mu,\color{red}{1000}) \\ \bar \mu \sim normal(3,3) \\ \sigma \sim exponential(1)\] |
In this model, the large standard deviation of the distribution of department level satisfaction (\(\small \mu_d \sim normal(\bar \mu,1000)\)) encodes the assumption of independent groups.
Getting from a full pooling to a no polling model by changing a model parameter shows that this are not qualitatively distinct models, but opposite endpoints of a continuous model space.
The corresponding ulam model looks as follows:
m.no_pooling = alist(
S ~ dnorm(mu, sigma),
mu <- mu_bar + z[d]*1000,
z[d] ~ dnorm(0,1),
mu_bar ~ dnorm(3,3),
sigma ~ dexp(2),
gq> vector[d]:mus<<-mu_bar+z*1000
)
Now we estimate the model …
fit.no_pooling = ulam(
m.no_pooling,
data = u.data,
log_lik = TRUE, iter = 1000,
chains = 4
)
and check convergence:
precis(fit.no_pooling,depth = 2) %>%
round(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## z[1] 0.00 0.00 0.00 0.01 152.41 1.03
## z[2] 0.00 0.00 -0.01 0.00 151.49 1.03
## z[3] 0.00 0.00 -0.01 0.00 150.10 1.03
## z[4] 0.00 0.00 0.00 0.01 150.37 1.03
## z[5] 0.00 0.00 0.00 0.01 151.15 1.03
## z[6] 0.00 0.00 0.00 0.00 148.87 1.03
## z[7] 0.00 0.00 0.00 0.00 149.98 1.03
## z[8] 0.00 0.00 0.00 0.01 149.66 1.03
## z[9] 0.00 0.00 -0.01 0.00 149.63 1.03
## z[10] 0.00 0.00 0.00 0.00 149.65 1.03
## z[11] 0.00 0.00 0.00 0.01 148.70 1.03
## z[12] 0.00 0.00 0.00 0.00 148.26 1.03
## z[13] 0.00 0.00 -0.01 0.00 149.20 1.03
## z[14] 0.00 0.00 -0.01 0.00 149.02 1.03
## z[15] 0.00 0.00 0.00 0.00 147.86 1.03
## mu_bar 2.90 3.03 -1.74 7.85 148.50 1.03
## sigma 1.02 0.03 0.98 1.06 281.88 1.01
## mus[1] 3.40 0.38 2.79 4.02 1873.91 1.00
## mus[2] 2.66 0.35 2.11 3.21 2177.09 1.00
## mus[3] 2.71 0.37 2.09 3.29 1886.23 1.00
## mus[4] 3.54 0.33 3.04 4.08 2353.43 1.00
## mus[5] 3.34 0.34 2.77 3.87 2378.79 1.00
## mus[6] 2.94 0.21 2.61 3.28 2801.84 1.00
## mus[7] 3.14 0.21 2.81 3.47 2532.67 1.00
## mus[8] 3.34 0.21 3.01 3.67 2707.75 1.00
## mus[9] 2.70 0.21 2.37 3.03 2495.72 1.00
## mus[10] 3.15 0.21 2.83 3.49 2338.38 1.00
## mus[11] 3.37 0.10 3.21 3.53 1816.83 1.00
## mus[12] 3.16 0.10 3.00 3.32 2313.76 1.00
## mus[13] 2.81 0.10 2.65 2.97 2166.62 1.00
## mus[14] 2.40 0.10 2.24 2.57 2077.21 1.00
## mus[15] 3.14 0.10 2.98 3.31 1960.21 1.00
Now lets look at the true and estimated satisfaction in departments, where the estimated satisfaction from the no-pooling model is shown as blue dots:
x = plot_data(return.x = TRUE)
est.no_pooling = precis(fit.no_pooling, pars = "mus", depth = 2)
points(x,est.no_pooling$mean, col = "blue", pch = 16)
As expected, the estimates of this simple, no-pooling, Bayesian model correspond to the averages we calculated earlier, except that we see a bit of shrinkage towards the prior mean for the small departments (1-5).
In the full and no pooling analysis, we determined the the degree of pooling by setting the standard deviation for the department level satisfaction to a very low or very high number, respectively.
The key advantage of partial polling models is that instead of setting this standard deviation to a fixed value, they estimate it as a parameter from the data. This allows such models to adapt to situations where the different groups are either similar (low standard deviation) or dissimilar (high standard deviation).
| Full pooling | No pooling | Partial pooling |
|---|---|---|
|
\[ S_i \sim normal(\mu_d, \sigma) \\ \mu_d \sim normal(\bar \mu,.0001) \\ \bar \mu \sim normal(3,3) \\ \sigma \sim exponential(1)\] |
\[S_i \sim normal(\mu_d, \sigma) \\ \mu_d \sim normal(\bar \mu,\color{red}{1000}) \\ \bar \mu \sim normal(3,3) \\ \sigma \sim exponential(1)\] |
\[S_i \sim normal(\mu_d, \sigma)\\ {\mu_d} \sim normal(\bar \mu,\color{red}{\Sigma}) \\ \sigma \sim exponential(1) \\ \bar \mu \sim normal(3,3) \\ \color{red}{\Sigma \sim exponential(0,.5)}\] |
Here is the partial pooling ulam model:
m.partial_pooling = alist(
S ~ dnorm(mu, sigma),
mu <- mu_bar + z[d] * Sigma,
mu_bar ~ dnorm(3,3),
sigma ~ dexp(1),
z[d] ~ dnorm(0,1),
Sigma ~ dhalfnorm(0,2),
gq> vector[d]:mus<<-mu_bar+z*Sigma
)
Now we estimate the model …
fit.partial_pooling = ulam(
m.partial_pooling,
data = u.data,
log_lik = TRUE, iter = 1000,
chains = 4
)
precis(fit.partial_pooling,depth = 2) %>%
round(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## mu_bar 3.04 0.09 2.89 3.20 623.24 1.00
## sigma 1.02 0.03 0.98 1.06 2249.05 1.00
## z[1] 0.45 0.78 -0.79 1.71 2040.26 1.00
## z[2] -0.53 0.76 -1.72 0.67 2311.23 1.00
## z[3] -0.39 0.81 -1.68 0.92 2419.21 1.00
## z[4] 0.71 0.73 -0.44 1.86 2026.16 1.00
## z[5] 0.43 0.75 -0.75 1.65 2557.04 1.00
## z[6] -0.20 0.62 -1.21 0.76 1418.61 1.00
## z[7] 0.22 0.59 -0.69 1.20 1661.63 1.00
## z[8] 0.67 0.61 -0.30 1.70 1272.08 1.00
## z[9] -0.73 0.62 -1.73 0.24 1464.15 1.00
## z[10] 0.26 0.60 -0.70 1.20 1716.11 1.00
## z[11] 0.98 0.48 0.28 1.77 986.03 1.00
## z[12] 0.36 0.44 -0.31 1.06 1061.16 1.00
## z[13] -0.70 0.43 -1.38 -0.03 938.84 1.00
## z[14] -1.92 0.57 -2.88 -1.06 778.76 1.00
## z[15] 0.30 0.43 -0.37 1.00 1095.08 1.00
## Sigma 0.31 0.09 0.20 0.47 616.42 1.01
## mus[1] 3.19 0.25 2.81 3.60 2239.54 1.00
## mus[2] 2.88 0.23 2.50 3.24 2537.76 1.00
## mus[3] 2.92 0.25 2.51 3.31 2327.49 1.00
## mus[4] 3.26 0.23 2.91 3.66 1749.68 1.00
## mus[5] 3.17 0.23 2.81 3.57 2768.52 1.00
## mus[6] 2.98 0.18 2.70 3.27 2724.51 1.00
## mus[7] 3.11 0.16 2.85 3.37 3126.59 1.00
## mus[8] 3.24 0.17 2.98 3.52 3420.75 1.00
## mus[9] 2.82 0.17 2.53 3.09 2468.87 1.00
## mus[10] 3.12 0.17 2.84 3.39 2723.14 1.00
## mus[11] 3.33 0.10 3.18 3.49 2410.32 1.00
## mus[12] 3.15 0.10 2.99 3.29 2975.65 1.00
## mus[13] 2.83 0.09 2.68 2.98 3109.48 1.00
## mus[14] 2.47 0.10 2.31 2.63 2101.18 1.00
## mus[15] 3.13 0.10 2.97 3.28 3610.49 1.00
Let’s see if the model detected some variability in department level satisfaction:
post.Sigma = extract.samples(fit.partial_pooling)$Sigma
hist(post.Sigma, xlim = c(0, max(post.Sigma)))
abline(v = sd_firm_satisfaction, lty = 2, lwd = 2, col = "green3")
The vertical green line is the true standard deviation, which shows that the model accurately estimates the variability of departments.
Models in which we estimate the between group standard deviation are called partial pooling models, because they lie somewhere on the continuum between no and full polling.
Now lets look at the estimated means:
x = plot_data(return.x = TRUE)
est.partial_pooling = precis(fit.partial_pooling, pars = "mus", depth = 2)
points(x,est.partial_pooling$mean, col = "blue", pch = 16)
To see which method provides the best estimates of department level satisfaction, we compare each with the true values:
abs.delta = rbind(
no_pooling = est.no_pooling$mean - department_mus,
full_pooling = est.full_pooling$mean - department_mus,
partial_pooling = est.partial_pooling$mean - department_mus
)
Here is a plot of these deviations:
abs.delta = abs.delta %>% abs() %>% t()
matplot(abs.delta, pch = 16, col = 2:4)
legend("topleft", bty = "n",
title = "Pooling",
col = 2:4,
pch = 16,
legend = gsub("_pooling","",colnames(abs.delta)))
It seems as if the partial pooling model is doing best, but it is not 100% clear.
A standard metric to calculate the performance of different models is the root means square deviation (RMSD), which we calculate now:
abs.delta^2 %>% # square deviation
colMeans() %>% # mean
sqrt() # root
## no_pooling full_pooling partial_pooling
## 0.2415329 0.2468614 0.1635485
Indeed, the partial pooling model as the smallest error. This is due to the same property of shrinkage estimators we already observed earlier in the course: Shrinkage estimators fit the data not as well as other models (here the no pooling model) but they are better at out of sample prediction.
Note though, that this is not a big surprise, because this is how we specified the data generating process.
Here is a function that produces the same plot we just saw, and that takes the between department standard deviation as an input. This allows us to test if the partial pooling model does well, even if a full or now pooling model is the true DGP.
check.accuracy = function(sd_firm_satisfaction = .25, n_departments = 15) {
fn = paste0("accuracy_",100*sd_firm_satisfaction,"_",n_departments,".Rdata")
n_employees =
rep(c(5,22,98), each = n_departments/3) +
rbinom(n_departments,n_departments/3,.5)
if (file.exists(fn)) {
load(fn)
} else {
# simulate data
department_mus =
rnorm(n_departments,
mean_firm_satisfaction,
sd_firm_satisfaction)
firm_satisfaction = c()
for (d in 1:n_departments) {
employee_satisfaction =
rnorm(n_employees[d],
department_mus[d],
1)
firm_satisfaction = rbind(
firm_satisfaction,
data.frame(
department = as.integer(d),
satisfaction = employee_satisfaction
)
)
}
#prepare ulam.data
u.data = list(
N = nrow(firm_satisfaction),
S = firm_satisfaction$satisfaction,
d = as.integer(firm_satisfaction$department))
# fit.models
fit.full_pooling = ulam(m.full_pooling.b,data = u.data,chains = 4, cores = n.cores)
fit.no_pooling = ulam(m.no_pooling,data = u.data,chains = 4, cores = n.cores)
fit.partial_pooling = ulam(m.partial_pooling,data = u.data,chains = 4, cores = n.cores)
save(fit.full_pooling,
fit.no_pooling,
fit.partial_pooling,
department_mus,u.data,
firm_satisfaction,
file = fn)
}
# get estimates
est.full_pooling = precis(fit.full_pooling, pars = "mus", depth = 2)
est.no_pooling = precis(fit.no_pooling, pars = "mus", depth = 2)
est.partial_pooling = precis(fit.partial_pooling, pars = "mus", depth = 2)
abs.delta = rbind(
no_pooling = est.no_pooling$mean - department_mus,
full_pooling = est.full_pooling$mean - department_mus,
partial_pooling = est.partial_pooling$mean - department_mus
) %>% abs() %>% t()
layout(matrix(c(1,1,2,3),ncol = 2))
RMSD = abs.delta^2 %>% colMeans() %>% sqrt()
matplot(abs.delta, pch = 16, col = 2:4, ylab = "RMSD",
main = paste0("sd = ", sd_firm_satisfaction, ", ",n_departments," groups"))
legend("topleft", bty = "n",
title = "Pooling",
col = 2:4,
pch = 16,
legend = gsub("_pooling","",colnames(abs.delta)))
barplot(RMSD, col = 2:4)
post.Sigma = extract.samples(fit.partial_pooling)$Sigma
hist(post.Sigma, xlim = c(0, max(post.Sigma)), probability = T)
curve(dnorm(x,0,2), add = T, col = "blue")
abline(v = 0, lty = 2, col = "red")
abline(v = sd_firm_satisfaction, lty = 2, lwd = 2, col = "green3")
}
Here is the performance of the three models if there is no variation between groups/departments.
set_par(cex = 1, mar=c(3,3,3,.5))
check.accuracy(sd_firm_satisfaction = .01, n_departments = 15)
Here is the performance of the three models if there is some variation between groups/departments.
set_par(cex = 1, mar=c(3,3,3,.5))
check.accuracy(sd_firm_satisfaction = 1, n_departments = 15)
Here is the performance of the three models if there is a lot of variation between groups/departments.
set_par(cex = 1, mar=c(3,3,3,.5))
check.accuracy(sd_firm_satisfaction = 10, n_departments = 15)
The partial pooling model is generally nearly as accurate as the
“true” model. Obviously, this also depends on how much data is
available! The last figures are based on a large number (30) of groups,
but if the number of groups becomes small and each group has only few
members, a partial pooling model will have difficulties estimating the
standard deviation for the between group variation.
One issue with multilevel models (with partial pooling) is that they are not “unbiased”, which means that they do not provide an estimate of the sample mean.
Lets look at the means from the no pooling an partial pooling models, together with the sample means:
sample_means =
describeBy(firm_satisfaction$satisfaction,
group = firm_satisfaction$department,
mat = TRUE)[,"mean"]
set_par()
plot(1:15, sample_means, ylab = "mean", xlab = "department", cex = 1.5)
points(1:15, est.no_pooling$mean, pch = 16, col = 2)
points(1:15, est.partial_pooling$mean, pch = 16, col = 4)
abline(h = mean(u.data$S), lty = 3)
arrows(x0 = 1:15,
y0 = est.no_pooling$mean,
y1 = est.partial_pooling$mean,
col = 4, lty = 3, length = .1)
legend("topright",
pch = c(1,16,16),
col = c(1,2,4), bty = "n",
legend = c("dep.mean","no pooling", "partial pooling"))
The deviation between sample mean and partial pooling estimate shows
that the latter is a biased estimate of the sample mean. This bias is
larger for smaller departments (on the left hand side) which which are
shrunk more, and it is also larger for departments with extreme means,
compare e.g. departments 13 and 14.
Now one could be concerned that partial pooling makes it difficult to detect differences between groups. But the bias just described is only part of the story. Next, we plot the estimates together with their credible intervals:
set_par()
ylim = range(c(est.no_pooling[,3:4],est.partial_pooling[, 3:4]))
plot(1:15, sample_means, ylab = "mean", xlab = "department", cex = 1.5, ylim = ylim)
points((1:15)-.1, est.no_pooling$mean, pch = 16, col = 2)
points((1:15)+.1, est.partial_pooling$mean, pch = 16, col = 4)
abline(h = mean(u.data$S), lty = 3)
arrows(x0 = (1:15)-.1,
y0 = est.no_pooling$`5.5%`,
y1 = est.no_pooling$`94.5%`,
col = 2, length = 0)
arrows(x0 = (1:15)+.1,
y0 = est.partial_pooling$`5.5%`,
y1 = est.partial_pooling$`94.5%`,
col = 4, length = 0)
legend("topright",
pch = c(1,16,16),
col = c(1,2,4), bty = "n",
legend = c("dep.mean","no pooling", "partial pooling"))
What do you see?
The variance of the estimates from the hierarchical partial pooling model is generally smaller. The intuition her is that because the estimate for one department also relies partially on data from other departments, the estimated mean for the department was derived from more data and is therefore less uncertain.
This reduction in bias means that even though all estimates were shrunk towards the global mean and group differences became smaller, we can still detect group differences well due to the reduces variance of the estimates for the group mean.
More generally, it can be said that total error of an estimate can be decomposed into:
VB = function(truth, estimate) {
return(c(
variance = mean((estimate-mean(estimate))^2),
bias = mean((truth-mean(estimate))^2)
))
}
Shrinkage methods and in particular multilevel regression and partial pooling make a different trade off than the more traditional no-pooling estimate by exchanging some bias for a reduction in variance.
The following figure shows how partial pooling trades off variance for bias:
VB.partial_pooling =
do.call(rbind,lapply(1:15, function(d) {
VB(sample_means[d],extract_post_ulam(fit.partial_pooling,pars = "mus")[[1]][,d])
}))
VB.no_pooling =
do.call(rbind,lapply(1:15, function(d) {
VB(sample_means[d],extract_post_ulam(fit.no_pooling,pars = "mus")[[1]][,d])
}))
set_par()
xlim = range(c(VB.partial_pooling[,1],VB.no_pooling[,1]))
ylim = range(c(VB.partial_pooling[,2],VB.no_pooling[,2]))
plot(0, type = "n", col = 4,
ylim = ylim, xlim = xlim,
ylab = "Bias", xlab = "Variance")
arrows(x0 = VB.no_pooling[,1],
y0 = VB.no_pooling[,2],
x1 = VB.partial_pooling[,1],
y1 = VB.partial_pooling[,2],
length = .1, col = "grey")
text(VB.partial_pooling[,1], VB.partial_pooling[,2], 1:15, col = 4)
text(VB.no_pooling[,1], VB.no_pooling[,2], 1:15, col = 2)
legend("topright", bty = "n",
col = c(2,4), pch = 16,
legend = c("no pooling","partial pooling"))